{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cathedral-typing",
   "metadata": {},
   "source": [
    "# Decision Tree\n",
    "\n",
    "Links to API References: [ClassificationTree](./python/api/ClassificationTree.ipynb), [RegressionTree](./python/api/RegressionTree.ipynb)\n",
    "\n",
    "*See the backing repository for Decision Tree [here](https://github.com/scikit-learn/scikit-learn).*\n",
    "\n",
    "<h2>Summary</h2>\n",
    "\n",
    "A supervised decision tree. This is a recursive partitioning method where the feature space is continually split into further partitions based on a split criteria. A predicted value is learned for each partition in the \"leaf nodes\" of the learned tree. This is a light wrapper to the decision trees exposed in `scikit-learn`. Single decision trees often have weak model performance, but are fast to train and great at identifying associations. Low depth decision trees are easy to interpret, but quickly become complex and unintelligible as the depth of the tree increases.  \n",
    "\n",
    "<h2>How it Works</h2>\n",
    "\n",
    "Christoph Molnar's \"Interpretable Machine Learning\" e-book [[1](molnar2020interpretable_dt)] has an excellent overview on decision trees that can be found [here](https://christophm.github.io/interpretable-ml-book/tree.html).\n",
    "\n",
    "For implementation specific details, scikit-learn's user guide [[2](pedregosa2011scikit_dt)] on decision trees is solid and can be found [here](https://scikit-learn.org/stable/modules/tree.html#tree).\n",
    "\n",
    "<h2>Code Example</h2>\n",
    "\n",
    "The following code will train an decision tree classifier for the breast cancer dataset. The visualizations provided will be for both global and local explanations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "surprising-smith",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret import set_visualize_provider\n",
    "from interpret.provider import InlineProvider\n",
    "set_visualize_provider(InlineProvider())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "divine-science",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "from interpret.glassbox import ClassificationTree\n",
    "from interpret import show\n",
    "\n",
    "seed = 42\n",
    "np.random.seed(seed)\n",
    "X, y = load_breast_cancer(return_X_y=True, as_frame=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)\n",
    "\n",
    "dt = ClassificationTree(random_state=seed)\n",
    "dt.fit(X_train, y_train)\n",
    "\n",
    "auc = roc_auc_score(y_test, dt.predict_proba(X_test)[:, 1])\n",
    "print(\"AUC: {:.3f}\".format(auc))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3819596",
   "metadata": {},
   "outputs": [],
   "source": [
    "show(dt.explain_global())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3819597",
   "metadata": {},
   "outputs": [],
   "source": [
    "show(dt.explain_local(X_test[:5], y_test[:5]), 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "metropolitan-idaho",
   "metadata": {},
   "source": [
    "<h2>Further Resources</h2>\n",
    "\n",
    "- [Wikipedia on Decision Trees](https://en.wikipedia.org/wiki/Decision_tree_learning)\n",
    "- [scikit-learn on their Decision Tree module](https://scikit-learn.org/stable/modules/tree.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "supreme-prescription",
   "metadata": {},
   "source": [
    "<h2>Bibliography</h2>\n",
    "\n",
    "(molnar2020interpretable_dt)=\n",
    "[1] Christoph Molnar. Interpretable machine learning. Lulu. com, 2020.\n",
    "\n",
    "(pedregosa2011scikit_dt)=\n",
    "[2] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. Scikit-learn: machine learning in python. the Journal of machine Learning research, 12:2825–2830, 2011."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
